{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Structure standardization\n",
    "\n",
    "(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev\n",
    "\n",
    "For CGRtools installation instructions and the tutorial files, see `https://github.com/cimm-kzn/CGRtools`\n",
    "\n",
    "NOTE: The tutorial should be run sequentially from the start. Running cells out of order will lead to unexpected results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pkg_resources\n",
    "if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:\n",
    "    print('WARNING: this tutorial was tested on CGRtools version 3.1')\n",
    "else:\n",
    "    print('Welcome!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load data for tutorial\n",
    "from pickle import load\n",
    "from traceback import format_exc\n",
    "\n",
    "with open('molecules.dat', 'rb') as f:\n",
    "    molecules = load(f) # list of MoleculeContainer objects\n",
    "with open('reactions.dat', 'rb') as f:\n",
    "    reactions = load(f) # list of ReactionContainer objects\n",
    "\n",
    "m1, m2, m3, m4 = molecules # molecule\n",
    "m12 = m3.copy()\n",
    "r1 = reactions[0] # reaction\n",
    "\n",
    "m1.reset_query_marks()\n",
    "m1.atom(1).isotope = 16\n",
    "m1.flush_cache()\n",
    "\n",
    "m1.delete_atom(3)\n",
    "m1.atom_implicit_h(1)\n",
    "m1.atom_explicit_h(1)\n",
    "m1.atom_total_h(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1. Molecules\n",
    "\n",
    "MoleculeContainer has `standardize` and `aromatize` methods.\n",
    "\n",
    "Method `aromatize` transforms the Kekulé representation of rings into the aromatic one.\n",
    "\n",
    "Method `standardize` applies functional group standardization rules to a molecule. The following rules are implemented (the corresponding SMARTS transformations are given):\n",
    "\n",
    "    • Aromatic N-Oxide \t[#7;a:1]=[O:2]>>[#7+:1]-[#8-:2]\n",
    "    • Azide \t\t\t[#7;A;X2-:1][N;X2+:2]#[N;X1:3]>>[#7:1]=[N+:2]=[#7-:3]\n",
    "    • Diazo  \t\t\t[#6;X3-:1][N;X2+:2]#[N;X1:3]>>[#6;A:1]=[N+:2]=[#7-:3]\n",
    "    • Diazonium  \t\t[#6]-[#7:1]=[#7+:2]>>[#6][N+:1]#[N:2]\n",
    "    • Iminium  \t\t[#6;X3+:1]-[#7;X3:2]>>[#6;A:1]=[#7+:2]\n",
    "    • Isocyanate  \t\t[#7+:1][#6;A-:2]=[O:3]>>[#7:1]=[C:2]=[O:3]\n",
    "    • Nitrilium  \t\t[#6;A;X2+:1]=[#7;X2:2]>>[C:1]#[N+:2]\n",
    "    • Nitro  \t\t\t[O:3]=[N:1]=[O:2]>>[#8-:2]-[#7+:1]=[O:3]\n",
    "    • Nitrone Nitronate \t[#6;A]=[N:1]=[O:2]>>[#8-:2]-[#7+:1]=[#6;A]\n",
    "    • Nitroso  \t\t[#6]-[#7H2+:1]-[#8;X1-:2]>>[#6]-[#7:1]=[O:2]\n",
    "    • Phosphonic  \t\t[#6][P+:1]([#8;X2])([#8;X2])[#8-:2]>>[#6][P:1]([#8])([#8])=[O:2]\n",
    "    • Phosphonium Ylide  \t[#6][P-:1]([#6])([#6])[#6+:2]>>[#6][P:1]([#6])([#6])=[#6;A:2]\n",
    "    • Selenite  \t\t[#8;X2][Se+:1]([#8;X2])[#8-:2]>>[#8][Se:1]([#8])=[O:2]\n",
    "    • Silicate  \t\t[#8;X2]-[#14+:1](-[#8;X2])-[#8-:2]>>[#8]-[#14:1](-[#8])=[O:2]\n",
    "    • Sulfine  \t\t[#6]-[#6](-[#6])=[S+:1][#8-:2]>>[#6]-[#6](-[#6])=[S:1]=[O:2]\n",
    "    • Sulfon  \t\t\t[#6][S;X3+:1]([#6])[#8-:2]>>[#6][S:1]([#6])=[O:2]\n",
    "    • Sulfonium Ylide  \t[#6][S-:1]([#6])[#6+:2]>>[#6][S:1]([#6])=[#6;A:2]\n",
    "    • Sulfoxide  \t\t[#6][S+:1]([#6])([#8-:2])=O>>[#6][S:1]([#6])(=[O:2])=O\n",
    "    • Sulfoxonium Ylide  \t[#6][S+:1]([#6])([#8-:2])=[#6;A]>>[#6][S:1]([#6])(=[#6;A])=[O:2]\n",
    "    • Tertiary N-Oxide  \t[#6]-[#7;X4:1]=[O:2]>>[#6]-[#7+:1]-[#8-:2]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m12 # molecule with kekulized ring"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m12.aromatize() # aromatizes and returns number of transformed rings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m12 # cleaned structure. Cache is flushed automatically"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m12.standardize()  # apply standardization. Returns number of transformed groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m12"
   ]
  },
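  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two calls shown above can be wrapped into a small helper for a list of molecules. This is a minimal sketch, not part of CGRtools: `clean_molecules` is a hypothetical name, and it relies only on what the cells above show, namely that `aromatize` and `standardize` modify the molecule in place and each return a number. To keep the originals untouched, pass copies, for example `clean_molecules([m3.copy()])`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def clean_molecules(mols):\n",
    "    # Hypothetical helper (not a CGRtools function).\n",
    "    # Aromatizes, then standardizes each molecule in place, in the same order as above.\n",
    "    # Returns one (rings, groups) tuple per molecule, built from the numbers\n",
    "    # returned by aromatize() and standardize().\n",
    "    report = []\n",
    "    for mol in mols:\n",
    "        rings = mol.aromatize()\n",
    "        groups = mol.standardize()\n",
    "        report.append((rings, groups))\n",
    "    return report"
   ]
  },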
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Molecules has `explicify_hydrogens` and `implicify_hydrogens` methods to handle hydrogens.\n",
    "\n",
    "This methods is used to add or remove hydrogens in molecule.\n",
    "\n",
    "Note that currently for pyrole-like molecules implicit hydrogens atoms are calculated incorrectly "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1.explicify_hydrogens() # returns number of added hydrogens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1 # coordinates are not calculated for the added hydrogen atoms, so they appear at the same position in the image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1.implicify_hydrogens()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1"
   ]
  },
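  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When explicit hydrogens are needed only temporarily, the two methods above can be paired in a context manager. This is a minimal sketch under stated assumptions: `explicit_hydrogens` is a hypothetical helper, not a CGRtools function, and `implicify_hydrogens` is assumed to remove every explicit hydrogen, including any that were present before the block was entered. It can be used as `with explicit_hydrogens(m1) as added: ...`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from contextlib import contextmanager\n",
    "\n",
    "@contextmanager\n",
    "def explicit_hydrogens(mol):\n",
    "    # Hypothetical helper (not a CGRtools function).\n",
    "    # Adds explicit hydrogens for the duration of the with-block\n",
    "    # and yields the number reported by explicify_hydrogens().\n",
    "    added = mol.explicify_hydrogens()\n",
    "    try:\n",
    "        yield added\n",
    "    finally:\n",
    "        # Assumption: this removes all explicit hydrogens,\n",
    "        # not only the ones added on entry.\n",
    "        mol.implicify_hydrogens()"
   ]
  },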
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CGRtools has an experimental algorithm for 2D geometry calculation. It works well only for small molecules. The algorithm requires the numpy and scipy packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m1.explicify_hydrogens() # add explicit hydrogens\n",
    "m1.calculate2d() # experimental force field-based 2d geometry calculation.\n",
    "m1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2. Reaction standardization\n",
    "ReactionContainer has `standardize`, `aromatize`, `explicify_hydrogens` and `implicify_hydrogens` methods as well. When called on a reaction, they are applied to all molecules in the reaction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reactions[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reactions[2].standardize()\n",
    "reactions[2].explicify_hydrogens()\n",
    "reactions[2]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cgrtools",
   "language": "python",
   "name": "cgrtools"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
